Multi-SOM: an Algorithm for High-Dimensional, Small Size Datasets

Authors

  • Shen Lu
  • Richard S. Segall
Abstract

Because experiments in bioinformatics are time-consuming, biological datasets are sometimes small but high-dimensional. Probability theory tells us that discovering knowledge from a set of data requires a sufficient number of samples; otherwise, the error bounds can become too large to be useful. In the SOM (Self-Organizing Map) algorithm, the initial map is based on the training data. To avoid the bias caused by insufficient training data, this paper presents an algorithm called Multi-SOM. Multi-SOM builds a number of small self-organizing maps instead of one big map, and uses Bayesian decision theory to make the final decision among similar neurons on different maps. This design better ensures a truly random set of initial weight vectors, makes the choice of map size less critical, and allows errors to average out. In experiments on microarray datasets, which are data-intensive collections of gene-related information, the precision of Multi-SOM is 10.58% higher than that of standard SOMs, and its recall is 11.07% higher. Thus, the Multi-SOM algorithm is practical.
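The abstract does not specify the training schedule or the exact Bayesian rule. The Python sketch below therefore fills those in with stated assumptions:

- Each small map is a standard online SOM trained with its own random seed.
- Every neuron records how many training samples of each class it wins.
- The final label is the class with the highest posterior probability. Each map's best-matching unit is treated as an independent piece of evidence, which amounts to a naive-Bayes combination with Laplace smoothing.

The names `train_som`, `best_matching_unit`, `MultiSOM` and `n_maps` are hypothetical. So are the 3×3 grid size and the learning-rate and neighborhood schedules. None of these are the authors' published settings.

```python
import numpy as np


def best_matching_unit(weights, x):
    """Index of the neuron whose weight vector is closest (Euclidean) to x."""
    return int(np.argmin(((weights - x) ** 2).sum(axis=1)))


def train_som(data, rows, cols, epochs, seed, lr0=0.5):
    """Train one small rectangular SOM with the classic online update rule."""
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    # Random initial weights, drawn uniformly within each feature's range.
    weights = rng.uniform(data.min(axis=0), data.max(axis=0), size=(rows * cols, dim))
    # Grid coordinates of each neuron, used by the Gaussian neighborhood.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    sigma0 = max(rows, cols) / 2.0
    total, step = epochs * n, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            frac = step / total
            lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3  # shrinking neighborhood radius
            bmu = best_matching_unit(weights, data[i])
            h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2.0 * sigma ** 2))
            # Pull the winner and its grid neighbors toward the sample.
            weights += lr * h[:, None] * (data[i] - weights)
            step += 1
    return weights


class MultiSOM:
    """Several small SOMs whose evidence is combined with a Bayes decision rule."""

    def __init__(self, n_maps=5, rows=3, cols=3, epochs=20, seed=0):
        self.n_maps, self.rows, self.cols = n_maps, rows, cols
        self.epochs, self.seed = epochs, seed

    def fit(self, data, labels):
        labels = np.asarray(labels, dtype=int)
        self.n_classes = int(labels.max()) + 1
        self.priors = np.bincount(labels, minlength=self.n_classes) / len(labels)
        self.maps = []
        for m in range(self.n_maps):
            # A different seed per map gives each map its own random initialization.
            w = train_som(data, self.rows, self.cols, self.epochs, self.seed + m)
            # counts[j, c] = number of class-c training samples won by neuron j.
            counts = np.zeros((w.shape[0], self.n_classes))
            for x, y in zip(data, labels):
                counts[best_matching_unit(w, x), y] += 1
            self.maps.append((w, counts))
        return self

    def predict_proba(self, x):
        """Posterior P(class | BMUs on all maps), assuming maps are independent."""
        log_post = np.log(self.priors + 1e-12)
        for w, counts in self.maps:
            bmu = best_matching_unit(w, x)
            # Laplace-smoothed estimate of P(this neuron wins | class).
            lik = (counts[bmu] + 1.0) / (counts.sum(axis=0) + counts.shape[0])
            log_post += np.log(lik)
        post = np.exp(log_post - log_post.max())
        return post / post.sum()

    def predict(self, x):
        # Bayes decision rule under 0-1 loss: pick the most probable class.
        return int(np.argmax(self.predict_proba(x)))
```

Each 3×3 map is cheap to train. Because every map starts from different random weights, the quirks of any single initialization tend to average out in the combined likelihood, which mirrors the motivation stated in the abstract. The independence assumption is the main simplification. All maps see the same training data, so the posteriors will be overconfident, although the argmax decision can still be reasonable.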

Publication date: 2013